MQE: track number of processed samples in each query #10232

charleskorn · 2024-12-13T00:09:47Z

What this PR does

This PR adds support for tracking the number of samples processed in a query evaluated by MQE.

Which issue(s) this PR fixes or relates to

Resolves #10138

Checklist

Tests updated.
[n/a] Documentation added.
[covered by Mimir Query Engine #10067] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
[n/a] about-versioning.md updated with experimental features.

tinitiuset

Thank you for being so quick on this. Looks good to me.

pkg/streamingpromql/operators/selectors/instant_vector_selector.go

jhesketh · 2024-12-16T12:05:50Z

pkg/streamingpromql/types/stats.go

+// QueryStats tracks statistics about the execution of a single query.
+//
+// It is not safe to use this type from multiple goroutines simultaneously.
+type QueryStats struct {


Why create our own struct instead of using stats.QuerySamples and passing that to the appropriate operators?

It would also make it easier to implement TotalSamplesPerStep if we want to support that too (which I think we do)

charleskorn · Jan 6, 2025

> Why create our own struct instead of using stats.QuerySamples and passing that to the appropriate operators?

Given we don't support anything other than TotalSamples in MQE, I wanted to make this clear in the code by using a struct that only had a field for TotalSamples.

> It would also make it easier to implement TotalSamplesPerStep if we want to support that too (which I think we do)

I don't think we want to do this unless there's a specific need for it - per-step stats are considered experimental and disabled by default in Prometheus, and are not possible to enable in Mimir as far as I can see. The docs for this also state that the value for each step should be the same as if the query was run as an instant query, so anyone who wanted this information could run the query as an instant query for the step(s) they're interested in.

jhesketh · Jan 8, 2025

> > Why create our own struct instead of using stats.QuerySamples and passing that to the appropriate operators?
>
> Given we don't support anything other than TotalSamples in MQE, I wanted to make this clear in the code by using a struct that only had a field for TotalSamples.

I would be happy with a comment in query.go, but I'm not opposed to a separate struct.

> > It would also make it easier to implement TotalSamplesPerStep if we want to support that too (which I think we do)
>
> I don't think we want to do this unless there's a specific need for it - per-step stats are considered experimental and disabled by default in Prometheus, and are not possible to enable in Mimir as far as I can see. The docs for this also state that the value for each step should be the same as if the query was run as an instant query, so anyone who wanted this information could run the query as an instant query for the step(s) they're interested in.

Fair enough, but also thinking of any future stats. I don't mind a separate struct though.
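
For readers following the thread, the design argued for here (a dedicated struct that exposes only a total sample count) might look roughly like the sketch below. Only the QueryStats name, its doc comment and the TotalSamples field name come from the discussion; the int64 field type, the selector type and readSeries are assumptions made for illustration and are not taken from the PR.

```go
package main

import "fmt"

// QueryStats tracks statistics about the execution of a single query.
// Sketch only: the field type is an assumption. The struct deliberately
// has no per-step or peak fields, so it cannot suggest they are supported.
//
// It is not safe to use this type from multiple goroutines simultaneously.
type QueryStats struct {
	TotalSamples int64
}

// selector is a hypothetical operator that records how many samples it reads.
type selector struct {
	stats *QueryStats
}

// readSeries pretends to load one series and adds its samples to the total.
func (s *selector) readSeries(samples []float64) {
	s.stats.TotalSamples += int64(len(samples))
}

func main() {
	stats := &QueryStats{}
	sel := &selector{stats: stats}

	sel.readSeries([]float64{1, 2, 3})
	sel.readSeries([]float64{4, 5})

	fmt.Println("total samples:", stats.TotalSamples)
}
```

Passing one shared pointer to every operator keeps the bookkeeping in a single place, at the cost of requiring that a query is evaluated on one goroutine at a time.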

pkg/streamingpromql/engine_test.go

jhesketh · 2024-12-17T11:05:51Z

pkg/streamingpromql/engine_test.go

+			require.Equal(t, testCase.expectedTotalSamples, prometheusCount, "invalid test case: expected samples does not match value from Prometheus' engine")
+
+			mimirCount := runQueryAndGetTotalSamples(t, mimirEngine, testCase.expr, testCase.isInstantQuery)
+			require.Equal(t, testCase.expectedTotalSamples, mimirCount)


We can also compare the samples loaded as part of our test gauntlet if we expect it to be the same in all cases

charleskorn · Jan 6, 2025

This is currently difficult due to the optimisation in prometheus/prometheus#14097, as Prometheus' engine sometimes skips loading data for histograms if it's not needed. MQE does not yet have the same optimisation, so there are some expected differences in the total sample count from the two engines in some cases.

Given the tests in TestQueryStats, and the fact the statistics are informational and may differ between engines in the future due to other optimisations, I'm tempted to leave this as-is.

Thoughts?

jhesketh · Jan 8, 2025

Happy to leave it out if that's the case.

It might be interesting to see some much larger queries/time ranges etc to see if we are returning consistent results. We could perhaps use the same data generated from the benchmarks to create some large queries (of just floats since NH will be different). Then have a flag to RequireEqualResults to compare them etc.

This isn't a blocker.
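
The trade-off in this thread, comparing sample counts between the two engines only where they are expected to agree, could be sketched roughly as follows. Everything in this block is hypothetical: engineResult, compareSampleCounts and the requireEqual parameter are illustrative names and are not code from the PR or from Mimir's test utilities.

```go
package main

import "fmt"

// engineResult is a hypothetical summary of one engine's evaluation of a query.
type engineResult struct {
	TotalSamples int64
}

// compareSampleCounts returns an error if the two engines disagree on the
// number of samples processed, but only when requireEqual is set. Callers
// would leave requireEqual unset for queries where a difference is expected,
// for example histogram queries where one engine skips loading unused data.
func compareSampleCounts(expr string, prometheus, mqe engineResult, requireEqual bool) error {
	if !requireEqual || prometheus.TotalSamples == mqe.TotalSamples {
		return nil
	}

	return fmt.Errorf(
		"query %q: Prometheus' engine counted %d samples, MQE counted %d",
		expr, prometheus.TotalSamples, mqe.TotalSamples,
	)
}

func main() {
	prom := engineResult{TotalSamples: 10}
	mqe := engineResult{TotalSamples: 12}

	// The counts differ, but the comparison is opt-in, so no error is returned.
	fmt.Println(compareSampleCounts("some_histogram", prom, mqe, false))

	// With the comparison enabled, the mismatch is reported.
	fmt.Println(compareSampleCounts("some_histogram", prom, mqe, true))
}
```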

jhesketh

I don't really like how we have to iterate through the buffer vs counting the samples on filling the buffer, but otherwise lgtm. Is it possible to see a benchmark of range-vectors of histograms before approving?

charleskorn · 2025-01-08T00:41:23Z

> I don't really like how we have to iterate through the buffer vs counting the samples on filling the buffer, but otherwise lgtm. Is it possible to see a benchmark of range-vectors of histograms before approving?

Benchmarks show there is no statistically significant difference with this PR compared to main - the extra work is insignificant compared to everything else that needs to be done.
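
To make the two approaches mentioned in this exchange concrete, here is a minimal sketch contrasting "count while filling the buffer" with "count by iterating the buffer for each step". The point and buffer types, the appended counter and countInRange are all hypothetical and much simpler than MQE's real buffers; the sketch only shows that the two strategies can produce different totals when the windows of consecutive steps overlap, and makes no claim about why the PR chose one over the other.

```go
package main

import "fmt"

// point is a hypothetical float sample with a timestamp.
type point struct {
	T int64
	F float64
}

// buffer is a hypothetical, simplified sample buffer.
type buffer struct {
	points   []point
	appended int64 // incremented once per point as the buffer is filled
}

// add stores a point and counts it at fill time.
func (b *buffer) add(p point) {
	b.points = append(b.points, p)
	b.appended++
}

// countInRange counts points with start < T <= end by iterating the buffer,
// as would be done once per step of a range query.
func (b *buffer) countInRange(start, end int64) int64 {
	var n int64
	for _, p := range b.points {
		if p.T > start && p.T <= end {
			n++
		}
	}
	return n
}

func main() {
	b := &buffer{}
	for t := int64(1); t <= 5; t++ {
		b.add(point{T: t, F: float64(t)})
	}

	// Two steps whose windows overlap at T=3.
	perStep := b.countInRange(0, 3) + b.countInRange(2, 5)

	fmt.Println("counted on fill:", b.appended)
	fmt.Println("counted per step:", perStep)
}
```

In this sketch the point at T=3 falls into both windows, so the per-step total is higher than the fill-time total.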

jhesketh

Thanks for checking 🚀

charleskorn mentioned this pull request Dec 11, 2024

Mimir Query Engine #10067

Open

MQE: track number of processed samples in each query

344bfc7

charleskorn force-pushed the charleskorn/read-samples-tracking branch from 24164eb to 344bfc7 Compare December 13, 2024 00:36

charleskorn marked this pull request as ready for review December 13, 2024 00:55

charleskorn requested a review from a team as a code owner December 13, 2024 00:55

tinitiuset approved these changes Dec 13, 2024

jhesketh reviewed Dec 16, 2024

jhesketh reviewed Dec 17, 2024

Updated how NH are counted to samples, update testing to check NaN's and NH's

5dea7f9

tinitiuset force-pushed the charleskorn/read-samples-tracking branch from 56ffb11 to 5dea7f9 Compare December 18, 2024 11:04

charleskorn added 3 commits January 6, 2025 14:00

Address PR feedback and reduce duplication

31cefce

Address PR feedback: add test case for stale markers

14a0b7c

Clarify variable name in runMixedMetricsTests

2d302f7

jhesketh reviewed Jan 8, 2025

jhesketh approved these changes Jan 8, 2025

charleskorn merged commit 4ec3018 into main Jan 8, 2025
29 checks passed

charleskorn deleted the charleskorn/read-samples-tracking branch January 8, 2025 00:47
